{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Executing Quantum Circuits \n",
    "\n",
    "In CUDA-Q, there are 4 ways in which one can execute quantum kernels: \n",
    "\n",
    "1. `sample`: yields measurement counts\n",
    "2. `run`: yields individual return values from multiple executions\n",
    "2. `observe`: yields expectation values \n",
    "3. `get_state`: yields the quantum statevector of the computation\n",
    "\n",
    "Asynchronous programming is a technique that enables your program to start a potentially long-running task and still be able to be responsive to other events while that task runs, rather than having to wait until that task has finished. Once that task has finished, your program is presented with the result. The most intensive task in the computation is the execution of the quantum kernel hence each execution function can be parallelized given access to multiple quantum processing units (multi-QPU) using: `sample_async`, `run_async`, `observe_async` and `get_state_async`.\n",
    "Since multi-QPU platforms are not yet feasible, we emulate each QPU with a GPU.\n",
    "\n",
    "## Sample\n",
    "\n",
    "Quantum states collapse upon measurement and hence need to be sampled many times to gather statistics. The CUDA-Q `sample` call enables this: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "     ╭───╮     \n",
      "q0 : ┤ h ├──●──\n",
      "     ╰───╯╭─┴─╮\n",
      "q1 : ─────┤ x ├\n",
      "          ╰───╯\n",
      "\n",
      "{ 00:492 11:508 }\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import cudaq\n",
    "import numpy as np \n",
    "\n",
    "qubit_count = 2\n",
    "\n",
    "# Define the simulation target.\n",
    "cudaq.set_target(\"qpp-cpu\")\n",
    "\n",
    "# Define a quantum kernel function.\n",
    "\n",
    "@cudaq.kernel\n",
    "def kernel(qubit_count: int):\n",
    "    qvector = cudaq.qvector(qubit_count)\n",
    "\n",
    "    # 2-qubit GHZ state.\n",
    "    h(qvector[0])\n",
    "    for i in range(1, qubit_count):\n",
    "        x.ctrl(qvector[0], qvector[i])\n",
    "\n",
    "    # If we dont specify measurements, all qubits are measured in\n",
    "    # the Z-basis by default or we can manually specify it also \n",
    "    # mz(qvector)\n",
    "\n",
    "\n",
    "print(cudaq.draw(kernel, qubit_count))\n",
    "\n",
    "result = cudaq.sample(kernel, qubit_count, shots_count=1000)\n",
    "\n",
    "print(result)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that there is a subtle difference between how `sample` is executed with the target device set to a simulator or with the target device set to a QPU. In simulation mode, the quantum state is built once and then sampled $s$ times where $s$ equals the `shots_count`. In hardware execution mode, the quantum state collapses upon measurement and hence needs to be rebuilt over and over again.\n",
    "\n",
    "There are a number of helpful tools that can be found in the [API docs](https://nvidia.github.io/cuda-quantum/latest/api/languages/python_api) to process the `Sample_Result` object produced by `sample`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sample Async\n",
    "\n",
    "`sample` also supports asynchronous execution for the [arguments it accepts](https://nvidia.github.io/cuda-quantum/latest/api/languages/python_api.html#cudaq.sample_async). One can parallelize over various kernels, variational parameters or even distribute shots counts over multiple QPUs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Run\n",
    "\n",
    "The `run` API executes a quantum kernel multiple times and returns each individual result. Unlike `sample`, which collects measurement statistics as counts, `run` preserves each individual return value from every execution. This is useful when you need to analyze the distribution of returned values rather than just aggregated measurement counts.\n",
    "\n",
    "Key points about `run`:\n",
    "\n",
    " - Requires a kernel that returns a non-void value\n",
    " - Returns a list containing all individual execution results\n",
    " - Supports scalar types (bool, int, float) and custom data classes as return types\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Executed 20 shots\n",
      "Results: [0, 0, 0, 0, 0, 3, 3, 3, 0, 0, 3, 0, 0, 3, 3, 3, 0, 0, 0, 0]\n",
      "Possible values: Either 0 or 3 due to GHZ state properties\n",
      "\n",
      "Counts of each result:\n",
      "0: 13 times\n",
      "3: 7 times\n"
     ]
    }
   ],
   "source": [
    "import cudaq\n",
    "from dataclasses import dataclass\n",
    "\n",
    "# Define the simulation target\n",
    "cudaq.set_target(\"qpp-cpu\")\n",
    "\n",
    "\n",
    "# Define a quantum kernel that returns an integer\n",
    "@cudaq.kernel\n",
    "def simple_ghz(num_qubits: int) -> int:\n",
    "    # Allocate qubits\n",
    "    qubits = cudaq.qvector(num_qubits)\n",
    "\n",
    "    # Create GHZ state\n",
    "    h(qubits[0])\n",
    "    for i in range(1, num_qubits):\n",
    "        x.ctrl(qubits[0], qubits[i])\n",
    "\n",
    "    # Measure and return total number of qubits in state |1⟩\n",
    "    res = 0\n",
    "    for i in range(num_qubits):\n",
    "        if mz(qubits[i]):\n",
    "            res += 1\n",
    "\n",
    "    return res\n",
    "\n",
    "\n",
    "# Execute the kernel 20 times\n",
    "num_qubits = 3\n",
    "results = cudaq.run(simple_ghz, num_qubits, shots_count=20)\n",
    "\n",
    "print(f\"Executed {len(results)} shots\")\n",
    "print(f\"Results: {results}\")\n",
    "print(f\"Possible values: Either 0 or {num_qubits} due to GHZ state properties\")\n",
    "\n",
    "# Count occurrences of each result\n",
    "value_counts = {}\n",
    "for value in results:\n",
    "    value_counts[value] = value_counts.get(value, 0) + 1\n",
    "\n",
    "print(\"\\nCounts of each result:\")\n",
    "for value, count in value_counts.items():\n",
    "    print(f\"{value}: {count} times\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Return Custom Data Types\n",
    "\n",
    "The `run` API also supports returning custom data types using Python's data classes. This allows returning multiple values from your quantum computation in a structured way."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Individual measurement results:\n",
      "Shot 0: {True, True}\ttotal ones=2\n",
      "Shot 1: {True, True}\ttotal ones=2\n",
      "Shot 2: {True, True}\ttotal ones=2\n",
      "Shot 3: {False, False}\ttotal ones=0\n",
      "Shot 4: {False, False}\ttotal ones=0\n",
      "Shot 5: {True, True}\ttotal ones=2\n",
      "Shot 6: {False, False}\ttotal ones=0\n",
      "Shot 7: {False, False}\ttotal ones=0\n",
      "Shot 8: {True, True}\ttotal ones=2\n",
      "Shot 9: {True, True}\ttotal ones=2\n",
      "\n",
      "Correlated measurements: 10/10 (100.0%)\n"
     ]
    }
   ],
   "source": [
    "import cudaq\n",
    "\n",
    "from dataclasses import dataclass\n",
    "\n",
    "\n",
    "# Define a custom dataclass to return from our quantum kernel\n",
    "@dataclass(slots=True)\n",
    "class MeasurementResult:\n",
    "    first_qubit: bool\n",
    "    last_qubit: bool\n",
    "    total_ones: int\n",
    "\n",
    "\n",
    "@cudaq.kernel\n",
    "def bell_pair_with_data() -> MeasurementResult:\n",
    "    # Create a bell pair\n",
    "    qubits = cudaq.qvector(2)\n",
    "    h(qubits[0])\n",
    "    x.ctrl(qubits[0], qubits[1])\n",
    "\n",
    "    # Measure both qubits\n",
    "    first_result = mz(qubits[0])\n",
    "    last_result = mz(qubits[1])\n",
    "\n",
    "    # Return custom data structure with results\n",
    "    total = 0\n",
    "    if first_result:\n",
    "        total = 1\n",
    "    if last_result:\n",
    "        total = total + 1\n",
    "\n",
    "    return MeasurementResult(first_result, last_result, total)\n",
    "\n",
    "\n",
    "# Run the kernel 10 times and get all results\n",
    "results = cudaq.run(bell_pair_with_data, shots_count=10)\n",
    "\n",
    "# Analyze the results\n",
    "print(\"Individual measurement results:\")\n",
    "for i, res in enumerate(results):\n",
    "    print(\n",
    "        f\"Shot {i}: {{{res.first_qubit}, {res.last_qubit}}}\\ttotal ones={res.total_ones}\"\n",
    "    )\n",
    "\n",
    "# Verify the Bell state correlations\n",
    "correlated_count = sum(\n",
    "    1 for res in results if res.first_qubit == res.last_qubit)\n",
    "print(\n",
    "    f\"\\nCorrelated measurements: {correlated_count}/{len(results)} ({correlated_count/len(results)*100:.1f}%)\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Run Async\n",
    "\n",
    "Similar to `sample_async` above, `run` also supports asynchronous execution for the [arguments it accepts](https://nvidia.github.io/cuda-quantum/latest/api/languages/python_api.html#cudaq.run_async)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> **NOTE:** Currently, `run` and `run_async` are only supported on simulator targets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## Observe\n",
    "\n",
    "The `observe` function allows us to calculate expectation values. We must supply a spin operator in the form of a Hamiltonian, $H$,  from which we would like to calculate $\\bra{\\psi}H\\ket{\\psi}$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<H> = 0.0\n"
     ]
    }
   ],
   "source": [
    "from cudaq import spin\n",
    "\n",
    "# Define a Hamiltonian in terms of Pauli Spin operators.\n",
    "hamiltonian = spin.z(0) + spin.y(1) + spin.x(0) * spin.z(0)\n",
    "\n",
    "# Compute the expectation value given the state prepared by the kernel.\n",
    "result = cudaq.observe(kernel, hamiltonian, qubit_count).expectation()\n",
    "\n",
    "print('<H> =', result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Observe Async\n",
    "\n",
    "`observe` can be a time intensive task. We can parallelize the execution of `observe` via the [arguments it accepts](https://nvidia.github.io/cuda-quantum/latest/api/languages/python_api.html#cudaq.observe_async). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1.1102230246251565e-16\n"
     ]
    }
   ],
   "source": [
    "# Set the simulation target to a multi-QPU platform \n",
    "# cudaq.set_target(\"nvidia\", option = 'mqpu')\n",
    "\n",
    "# Measuring the expectation value of 2 different hamiltonians in parallel\n",
    "hamiltonian_1 = spin.x(0) + spin.y(1) + spin.z(0)*spin.y(1)\n",
    "# hamiltonian_2 = spin.z(1) + spin.y(0) + spin.x(1)*spin.x(0)\n",
    "\n",
    "# Asynchronous execution on multiple qpus via nvidia gpus.\n",
    "result_1 = cudaq.observe_async(kernel, hamiltonian_1, qubit_count, qpu_id=0)\n",
    "# result_2 = cudaq.observe_async(kernel, hamiltonian_2, qubit_count, qpu_id=1)\n",
    "\n",
    "# Retrieve results \n",
    "print(result_1.get().expectation())\n",
    "# print(result_2.get().expectation())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Above we parallelized the `observe` call over the `hamiltonian` parameter however we can parallelize over any of the argument it accepts by just iterating obver the `qpu_id`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "## Get state\n",
    "\n",
    "The `get_state` function gives us access to the quantum statevector of the computation. Remember, that this is only feasible in simulation mode. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.70710678+0.j 0.        +0.j 0.        +0.j 0.70710678+0.j]\n"
     ]
    }
   ],
   "source": [
    "# Compute the statevector of the kernel\n",
    "result = cudaq.get_state(kernel, qubit_count)\n",
    "\n",
    "print(np.array(result))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The statevector generated by the `get_state` command follows Big-endian convention for associating numbers with their binary representations, which places the least significant bit on the left.  That is, for the example of a 2-bit system, we have the following translation between integers and bits:\n",
    "$$\\begin{matrix} \\text{Integer} & \\text{Binary representation}\\\\\n",
    "& \\text{least signinificant bit on left}\\\\\n",
    "0 =\\textcolor{red}{0}*2^0+\\textcolor{blue}{0}*2^1 & \\textcolor{red}{0}\\textcolor{blue}{0} \\\\\n",
    "1 = \\textcolor{red}{1}*2^0 + \\textcolor{blue}{0} *2^1 & \\textcolor{red}{1}\\textcolor{blue}{0}\\\\\n",
    "2 = \\textcolor{red}{0}*2^0 + \\textcolor{blue}{1}*2^1 & \\textcolor{red}{0}\\textcolor{blue}{1} \\\\\n",
    "3 = \\textcolor{red}{1}*2^0 + \\textcolor{blue}{1}*2^1 & \\textcolor{red}{1}\\textcolor{blue}{1} \\end{matrix}\n",
    "$$\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Get State Async\n",
    "\n",
    "Similar to `observe_async` above, `get_state` also supports asynchronous execution for the [arguments it accepts](https://nvidia.github.io/cuda-quantum/latest/api/languages/python_api.html#cudaq.get_state_async)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "CUDA-Q Version proto-0.8.0-developer (https://github.com/NVIDIA/cuda-quantum cd3ef17fc8354e5e7428e3abd34f8d5e14c8b09a)\n"
     ]
    }
   ],
   "source": [
    "print(cudaq.__version__)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
